Markov Chains and Hidden Markov Models

George Tzanetakis, University of Victoria

In this notebook we will explore hidden markov models. We start with random variables and a simple independent, identically distributed model for weather. Then we look into how to form a Markov Chain to transition between states and finally we sample a Hidden Markov Model to show how the samples are generated based on the Markov Chain of the hidden states. The results are visualized as strips of colored rectangles. Experiments with the transition probabilities and the emission probabilities can lead to better understanding of how Hidden Markov Models work in terms of generating data.

The notebook ends by creating by hand a simple discrete HMM for generating chords. The results are displayed as a score as well as played.

The usual helper Random Variable class


In [3]:
%matplotlib inline 
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
from hmmlearn import hmm



class Random_Variable: 
    
    def __init__(self, name, values, probability_distribution): 
        self.name = name 
        self.values = values 
        self.probability_distribution = probability_distribution 
        if all(type(item) is np.int64 for item in values): 
            self.type = 'numeric'
            self.rv = stats.rv_discrete(name = name, values = (values, probability_distribution))
        elif all(type(item) is str for item in values): 
            self.type = 'symbolic'
            self.rv = stats.rv_discrete(name = name, values = (np.arange(len(values)), probability_distribution))
            self.symbolic_values = values 
        else: 
            self.type = 'undefined'
            
    def sample(self,size): 
        if (self.type =='numeric'): 
            return self.rv.rvs(size=size)
        elif (self.type == 'symbolic'): 
            numeric_samples = self.rv.rvs(size=size)
            mapped_samples = [self.values[x] for x in numeric_samples]
            return mapped_samples 
        
    def probs(self): 
        return self.probability_distribution
    
    def vals(self): 
        print(self.type)
        return self.values

Generating random weather samples with a IID model with no time dependencies

Let's first create some random samples of a symbolic random variable corresponding to the weather with two values Sunny (S) and cloudy (C) and generate random weather for 365 days. The assumption in this model is that the weather of each day is indepedent of the previous days and drawn from the same probability distribution.


In [8]:
values = ['S', 'C']
probabilities = [0.5, 0.5]
weather = Random_Variable('weather', values, probabilities)
samples = weather.sample(365)
print(",".join(samples))


S,S,C,C,S,S,S,C,S,C,S,C,C,S,C,S,C,S,C,C,S,S,C,S,C,S,S,S,C,C,C,C,S,C,S,S,C,C,C,C,C,S,C,S,S,C,S,C,S,C,S,C,C,S,S,S,C,S,S,S,S,S,C,C,S,S,C,C,C,S,C,S,C,S,C,C,C,S,S,S,S,S,C,S,S,C,C,S,C,C,S,S,C,C,S,S,C,S,C,S,S,S,C,C,C,S,S,S,C,S,C,C,S,S,C,C,S,C,S,C,C,C,S,C,S,S,C,S,S,S,C,C,C,C,C,C,S,C,S,C,S,S,C,C,C,S,C,C,S,C,C,S,S,S,S,S,S,C,C,C,S,C,C,S,S,S,C,S,S,S,C,S,C,C,C,S,C,C,C,C,C,C,C,S,S,S,S,C,C,C,S,S,C,S,C,S,C,C,C,C,S,C,S,S,S,C,S,C,S,C,C,C,S,C,S,S,C,S,S,C,C,S,C,C,S,C,C,S,S,S,C,C,S,C,S,C,S,C,S,S,S,C,C,C,C,S,C,S,C,C,C,C,S,C,C,C,C,C,S,C,S,S,C,S,S,S,C,C,C,C,C,C,S,C,C,S,S,C,C,C,S,C,C,C,C,S,S,C,S,S,S,S,C,C,S,C,C,C,S,C,C,S,S,S,C,S,C,C,S,S,S,S,C,S,S,C,S,C,C,C,S,C,C,C,S,S,C,S,C,C,S,S,C,S,C,C,C,S,C,C,C,C,C,C,C,S,C,C,S,S,S,S,C,S,C,C,C,C,S,C,C,S,C,C,S

Now let lets visualize these samples using yellow for sunny and grey for cloudy


In [16]:
state2color = {} 
state2color['S'] = 'yellow'
state2color['C'] = 'grey'

def plot_weather_samples(samples, state2color, title): 
    colors = [state2color[x] for x in samples]
    x = np.arange(0, len(colors))
    y = np.ones(len(colors))
    plt.figure(figsize=(10,1))
    plt.bar(x, y, color=colors, width=1)
    plt.title(title)
    
plot_weather_samples(samples, state2color, 'iid')


Markov Chain

Now instead of independently sampling the weather random variable lets form a markov chain. The Markov chain will start at a particular state and then will either stay in the same state or transition to a different state based on a transition probability matrix. To accomplish that we basically create a random variable for each row of the transition matrix that basically corresponds to the probabilities of the transitions emanating fromt the state corresponding to that row. Then we can use the markov chain to generate sequences of samples and contrast these sequence with the iid weather model. By adjusting the transition probabilities you can in a probabilistic way control the different lengths of "stretches" of the same state.


In [17]:
def markov_chain(transmat, state, state_names, samples): 
    (rows, cols) = transmat.shape 
    rvs = [] 
    values = list(np.arange(0,rows))
    
    # create random variables for each row of transition matrix 
    for r in range(rows): 
        rv = Random_Variable("row" + str(r), values, transmat[r])
        rvs.append(rv)
    
    # start from initial state and then sample the appropriate 
    # random variable based on the state following the transitions 
    states = [] 
    for n in range(samples): 
        state = rvs[state].sample(1)[0]    
        states.append(state_names[state])
    return states


# transition matrices for the Markov Chain 
transmat1 = np.array([[0.7, 0.3], 
                    [0.2, 0.8]])

transmat2 = np.array([[0.9, 0.1], 
                    [0.1, 0.9]])

transmat3 = np.array([[0.5, 0.5], 
                     [0.5, 0.5]])

state2color = {} 
state2color['S'] = 'yellow'
state2color['C'] = 'grey'

# plot the iid model too
samples = weather.sample(365)
plot_weather_samples(samples, state2color, 'iid')

samples1 = markov_chain(transmat1,0,['S','C'], 365)
plot_weather_samples(samples1, state2color, 'markov chain 1')

samples2 = markov_chain(transmat2,0,['S','C'],365)
plot_weather_samples(samples2, state2color, 'marov_chain 2')

samples3 = markov_chain(transmat3,0,['S','C'], 365)
plot_weather_samples(samples3, state2color, 'markov_chain 3')


Note: Look back at the Random Variables notebook for an example of generating melodies using a Markov Chain with a transition probability matrix that is calculated by analyzing a corpus of chorales.

Generating samples using a Hidden Markov Model

Lets now look at how a Hidden Markov Model would work by having a Markov Chain to generate a sequence of states and for each state having a different emission probability. When sunny we will output red or yellow with higher probabilities and when cloudy black or blue. First we will write the code directly and then we will use the hmmlearn package.


In [34]:
state2color = {} 
state2color['S'] = 'yellow'
state2color['C'] = 'grey'

# generate random samples for a year 
samples = weather.sample(365)
states = markov_chain(transmat1,0,['S','C'], 365)
plot_weather_samples(states, state2color, "markov chain 1")

# create two random variables one of the sunny state and one for the cloudy 
sunny_colors = Random_Variable('sunny_colors', ['y', 'r', 'b', 'g'], 
                              [0.6, 0.3, 0.1, 0.0])
cloudy_colors = Random_Variable('cloudy_colors', ['y', 'r', 'b', 'g'], 
                               [0.0, 0.1, 0.4, 0.5])

def emit_obs(state, sunny_colors, cloudy_colors): 
    if (state == 'S'): 
        obs = sunny_colors.sample(1)[0]
    else: 
        obs = cloudy_colors.sample(1)[0]
    return obs 

# iterate over the sequence of states and emit color based on the emission probabilities 
obs = [emit_sample(s, sunny_colors, cloudy_colors) for s in states]

obs2color = {} 
obs2color['y'] = 'yellow'
obs2color['r'] = 'red'
obs2color['b'] = 'blue'
obs2color['g'] = 'grey'
plot_weather_samples(obs, obs2color, "Observed sky color")

# let's zoom in a month 
plot_weather_samples(states[0:30], state2color, 'states for a month')
plot_weather_samples(obs[0:30], obs2color, 'observations for a month')


Multinomial HMM

Lets do the same generation process using the multinomail HMM model supported by the hmmlearn python package.


In [41]:
transmat = np.array([[0.7, 0.3], 
                    [0.2, 0.8]])

start_prob = np.array([1.0, 0.0, 0.0])

# yellow and red have high probs for sunny 
# blue and grey have high probs for cloudy 
emission_probs = np.array([[0.6, 0.3, 0.1, 0.0], 
                           [0.0, 0.1, 0.4, 0.5]])

model = hmm.MultinomialHMM(n_components=2)
model.startprob_ = start_prob 
model.transmat_ = transmat 
model.emissionprob_ = emission_probs

# sample the model - X is the observed values 
# and Z is the "hidden" states 
X, Z = model.sample(365)

# we have to re-define state2color and obj2color as the hmm-learn 
# package just outputs numbers for the states 
state2color = {} 
state2color[0] = 'yellow'
state2color[1] = 'grey'
plot_weather_samples(Z, state2color, 'states')

samples = [item for sublist in X for item in sublist]
obj2color = {} 
obj2color[0] = 'yellow'
obj2color[1] = 'red'
obj2color[2] = 'blue'
obj2color[3] = 'grey'
plot_weather_samples(samples, obj2color, 'observations')


Estimating the parameters of an HMM

Let's sample the generative HMM and get a sequence of 1000 observations. Now we can learn in an unsupervised way the paraemters of a two component multinomial HMM just using these observations. Then we can compare the learned parameters with the original parameters of the model used to generate the observations. Notice that the order of the components is different between the original and estimated models. Notice that hmmlearn does NOT directly support supervised training where you have both the labels and observations. It is possible to initialize a HMM model with some of the parameters and learn the others. For example you can initialize the transition matrix and learn the emission probabilities. That way you could implement supervised learning for a multinomial HMM. In many practical applications the hidden labels are not available and that's the hard case that is implemented.


In [43]:
# generate the samples 
X, Z = model.sample(1000)
# learn a new model 
estimated_model = hmm.MultinomialHMM(n_components=2, n_iter=10000).fit(X)

Let's compare the estimated model parameters with the original model.


In [44]:
print("Transition matrix")
print("Estimated model:")
print(estimated_model.transmat_)
print("Original model:")
print(model.transmat_)
print("Emission probabilities")
print("Estimated model")
print(estimated_model.emissionprob_)
print("Original model")
print(model.emissionprob_)


Transition matrix
Estimated model:
[[ 0.81106942  0.18893058]
 [ 0.30524822  0.69475178]]
Original model:
[[ 0.7  0.3]
 [ 0.2  0.8]]
Emission probabilities
Estimated model
[[  5.07058360e-04   7.40903942e-02   3.97953839e-01   5.27448708e-01]
 [  6.08635509e-01   3.39512254e-01   5.16777944e-02   1.74442447e-04]]
Original model
[[ 0.6  0.3  0.1  0. ]
 [ 0.   0.1  0.4  0.5]]

Predicting a sequence of states given a sequence of observations

We can also use the trained HMM model to predict a sequence of hidden states given a sequence of observations. This is the tasks of maximum likelihood sequence estimation. For example in Speech Recognition it would correspond to estimating a sequence of phonemes (hidden states) from a sequence of observations (acoustic vectors).


In [48]:
Z2 = estimated_model.predict(X)
state2color = {} 
state2color[0] = 'yellow'
state2color[1] = 'grey'
plot_weather_samples(Z, state2color, 'Original states')
plot_weather_samples(Z2, state2color, 'Predicted states')

# note the reversal of colors for the states as the order of components is not the same. 
# we can easily fix this by change the state2color 
state2color = {} 
state2color[1] = 'yellow'
state2color[0] = 'grey'
plot_weather_samples(Z2, state2color, 'Flipped Predicted states')


The estimated model can be sampled just like the original model


In [52]:
X, Z = estimated_model.sample(365)

state2color = {} 
state2color[0] = 'yellow'
state2color[1] = 'grey'
plot_weather_samples(Z, state2color, 'states generated by estimated model ')

samples = [item for sublist in X for item in sublist]
obs2color = {} 
obs2color[0] = 'yellow'
obs2color[1] = 'red'
obs2color[2] = 'blue'
obs2color[3] = 'grey'
plot_weather_samples(samples, obs2color, 'observations generated by estimated model')


A HMM example using Chords

Let's do pretend music example by having the states model a chord progression consisting of D (II), G(V), C (I) chords and the observations consist of chord type i.e whether they are minor7, major7, or dominant7. This is an extremely simplified model of how harmony works but it does do a bit better than random.


In [71]:
# probabities of each state D(II), G(V), C(I). The transitions are semi-plausible but set by hand. 
# in a full problem they would be learned from data 
transmat = np.array([[0.4, 0.4, 0.2], 
                    [0.1, 0.1, 0.8], 
                    [0.0, 0.3, 0.7]])

start_prob = np.array([1.0, 0.0, 0.0])

# the emission probabilities are also set by hand and semi-plausible and correspond 
# to the probability that a chord is dominant, minor or major 7th. Notice for example 
# that if the chord is a C(I) (the third row then it will never be a dominant chord the 
# last 0.0 in that row 
emission_probs = np.array([[0.4, 0.0, 0.4], 
                           [0.3, 0.3, 0.3],
                           [0.2, 0.8, 0.0]]) 
                          

chord_model = hmm.MultinomialHMM(n_components=2)
chord_model.startprob_ = start_prob 
chord_model.transmat_ = transmat 
chord_model.emissionprob_ = emission_probs

X, Z = chord_model.sample(10)
state2name = {} 
state2name[0] = 'D'
state2name[1] = 'G'
state2name[2] = 'C'
chords = [state2name[state] for state in Z]
print(chords)

obj2name = {}
obj2name[0] = 'min7'
obj2name[1] = 'maj7'
obj2name[2] = '7'
observations = [obj2name[item] for sublist in X for item in sublist]
print(observations)

chords = [''.join(chord) for chord in zip(chords,observations)]
print(chords)


['D', 'G', 'C', 'G', 'C', 'C', 'C', 'C', 'C', 'C']
['7', 'min7', 'maj7', '7', 'maj7', 'maj7', 'maj7', 'min7', 'maj7', 'maj7']
['D7', 'Gmin7', 'Cmaj7', 'G7', 'Cmaj7', 'Cmaj7', 'Cmaj7', 'Cmin7', 'Cmaj7', 'Cmaj7']

Playing back the generated chords

Now that we have generated a sequence of chords symbols with a little bit of work we can play it back using Music21


In [72]:
from music21 import *

# create some chords for II, V, I 
d7 = chord.Chord(['D4','F4', 'A4', 'C5'])
dmin7 = chord.Chord(['D4','F-4', 'A4', 'C5'])
dmaj7 = chord.Chord(['D4','F#4', 'A4', 'C#5'])

c7 = d7.transpose(-2)
cmin7 = dmin7.transpose(-2)
cmaj7 = dmaj7.transpose(-2)

g7 = d7.transpose(5)
gmin7 = dmin7.transpose(5)
gmaj7 = dmaj7.transpose(5)
print(g7.pitches)

stream1 = stream.Stream()
stream1.repeatAppend(dmin7,1)
stream1.repeatAppend(g7,1)
stream1.repeatAppend(cmaj7,1)
stream1.repeatAppend(cmaj7,1)
print(stream1)

name2chord = {} 
name2chord['C7'] = c7 
name2chord['Cmin7'] = cmin7 
name2chord['Cmaj7'] = cmaj7

name2chord['D7'] = d7 
name2chord['Dmin7'] = dmin7 
name2chord['Dmaj7'] = dmaj7

name2chord['G7'] = g7 
name2chord['Gmin7'] = gmin7 
name2chord['Gmaj7'] = gmaj7


hmm_chords = stream.Stream() 
for c in chords: 
    hmm_chords.repeatAppend(name2chord[c],1)


(<music21.pitch.Pitch G4>, <music21.pitch.Pitch B-4>, <music21.pitch.Pitch D5>, <music21.pitch.Pitch F5>)
<music21.stream.Stream 0x111629978>

In [74]:
# let's check that we can play streams of chords 
#sp = midi.realtime.StreamPlayer(stream1)
#sp.play()

# let's now play a hidden markov model generated chord sequence
print(chords)
hmm_chords.show()
sp = midi.realtime.StreamPlayer(hmm_chords)
sp.play()


['D7', 'Gmin7', 'Cmaj7', 'G7', 'Cmaj7', 'Cmaj7', 'Cmaj7', 'Cmin7', 'Cmaj7', 'Cmaj7']